Contents

library(dplyr)
library(ggplot2)
library(scater)
library(SpatialExperiment)
library(tidyr)

1 Summary

A brief summary of the package’s current structure is outlined below. To give some examples, we load an example SpatialExperiment containing 10X Visium spatial gene expression data of two serial mouse brain sections (Sagittal-Posterior) available here.

data(ve)

The SpatialExperiment class

Spatial data

TBW

head(spatialData(ve))
## DataFrame with 6 rows and 5 columns
##                      x_coord   y_coord in_tissue array_row array_col
##                    <integer> <integer> <logical> <integer> <integer>
## AAACAACGAATAGTTC-1      1419      2534     FALSE         0        16
## AAACAAGTATCTCCCA-1      7409      8455      TRUE        50       102
## AAACAATCTACTAGCA-1      1778      4393     FALSE         3        43
## AAACACCAATAACTGC-1      8487      2740      TRUE        59        19
## AAACAGAGCGACTCCT-1      3096      7905      TRUE        14        94
## AAACAGCTTTCAGAAG-1      6570      2052      TRUE        43         9

Image data

Image-related data are stored in the int_metadata()$imgData field as a DataFrame with the following columns:

  • sample_id and image_id specifying the image’s sample and image identifier
  • data: a list of SpatialImages containing the image’s grob, path and/or URL
  • width and height giving the image’s dimension in pixel
  • scaleFactor used to re-scale spatial coordinates according to the image’s resolution
(df <- imgData(ve))
## DataFrame with 2 rows and 6 columns
##     sample_id    image_id   data     width    height scaleFactor
##   <character> <character> <list> <integer> <integer>   <numeric>
## 1    section1      lowres               NA        NA   0.0516351
## 2    section2      lowres               NA        NA   0.0516351

Getters, setters & methods

  • scaleFactors() retrieves scale factors
  • imgData(), imgData() <- value to get/set the imgData DataFrame
  • add/removeImg() can be used to create/eliminate images in the imgData
  • load/unloadImg() can be used to load/unload images, i.e. add/remove the grob
  • the read10xVisium() constructor can be used to create a SpatialExperiment from 10x Visium spatial gene expression data in one line

1.1 The SpatialImage class

Contains three slots that store any available information associated with an image:

  • @grob: NULL or an object class rastergrob from the grid package
  • @path: NULL or a character strings specifying an image file name (.png, .jpg or .tif)
  • @url: NULL or a character string specifying an URL from which to retrieve the image
df$data[[1]]
## A SpatialImage with 1 source(s):
##       > not loaded
## grob: NA
## path: /Users/inzirio/Desktop/gDrive/works/coding/Spatial
##       Experiment/inst/extdata/10xVisium/section1/spatial
##       /tissue_lowres_image.png
##  url: NA
  • imgGrob/path/Url(), ... <- value access/set data in the respective slots
  • load/unloadImg() are used to add/drop the grob slot
  • loadImg() supports caching when loading from a URL; in general, paths are given precedence over URLs

2 The SpatialExperiment class

The SpatialExperiment class extends the SingleCellExperiment class by requiring specific fields to be present in the object’s colData and int_metadata. These aim to accommodate spatially and image related data“,”respectively.

3 Reading 10X Visium data

The 10X Genomics’ CellRanger pipeline will process the data using standard output file formats that are saved, for each sample, in a single directory /<sample>/outs/ of the following structure:

sample 
|—outs 
··|—raw/filtered_feature_bc_matrix.h5 
··|—raw/filtered_feature_bc_matrix 
····|—barcodes.tsv 
····|—features.tsv 
····|—matrix.mtx 
··|—spatial 
····|—scalefactors_json.json 
····|—tissue_lowres_image.png 
····|—tissue_positions_list.csv 

We can load these data into a SpatialExperiment using the read10xVisium function, which will read in all relevant information, including the count data, spatial coordinates, scale factors, and images:

dir <- system.file(
  file.path("extdata", "10xVisium"), 
  package = "SpatialExperiment")

sample_ids <- c("section1", "section2")
samples <- file.path(dir, sample_ids)

list.files(samples[1])
## [1] "raw_feature_bc_matrix.h5" "spatial"
list.files(file.path(samples[1], "spatial"))
## [1] "scalefactors_json.json"    "tissue_lowres_image.png"  
## [3] "tissue_positions_list.csv"
(ve <- read10xVisium(samples, sample_ids,
  images = "lowres", # specify which image(s) to include
  load = TRUE))      # specify whether or not to load image(s)
## class: SpatialExperiment 
## dim: 32285 9984 
## metadata(2): Samples Samples
## assays(1): counts
## rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000095019 ENSMUSG00000095041
## rowData names(1): symbol
## colnames(9984): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
##   TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(7): Barcode sample_id ... array_row array_col
## reducedDimNames(0):
## altExpNames(0):
## imgData(6): sample_id image_id ... height scaleFactor

4 Spatial data

Sample identifiers, indication of whether or not an observation was mapped to the tissue, as well as spatial coordinates are stored inside the colData.

# tabulate number of spots mapped to tissue
table(
  in_tissue = isInTissue(ve),
  sample_id = ve$sample_id)
##          sample_id
## in_tissue section1 section2
##     FALSE     1637     1637
##     TRUE      3355     3355
# view spatial coordinates
head(spatialData(ve))
## DataFrame with 6 rows and 5 columns
##                      x_coord   y_coord in_tissue array_row array_col
##                    <integer> <integer> <logical> <integer> <integer>
## AAACAACGAATAGTTC-1      1419      2534     FALSE         0        16
## AAACAAGTATCTCCCA-1      7409      8455      TRUE        50       102
## AAACAATCTACTAGCA-1      1778      4393     FALSE         3        43
## AAACACCAATAACTGC-1      8487      2740      TRUE        59        19
## AAACAGAGCGACTCCT-1      3096      7905      TRUE        14        94
## AAACAGCTTTCAGAAG-1      6570      2052      TRUE        43         9
head(spatialCoords(ve))
##                    x_coord y_coord
## AAACAACGAATAGTTC-1    1419    2534
## AAACAAGTATCTCCCA-1    7409    8455
## AAACAATCTACTAGCA-1    1778    4393
## AAACACCAATAACTGC-1    8487    2740
## AAACAGAGCGACTCCT-1    3096    7905
## AAACAGCTTTCAGAAG-1    6570    2052

5 Image data

All image related data are stored inside the int_metadata’s imgData field as DataFrame of the following structure:

The imgData() accessor can be used to retrieve the image data stored within the object:

imgData(ve)
## DataFrame with 2 rows and 6 columns
##     sample_id    image_id   data     width    height scaleFactor
##   <character> <character> <list> <integer> <integer>   <numeric>
## 1    section1      lowres              600       600   0.0516351
## 2    section2      lowres              600       600   0.0516351

5.1 The SpatialImage class

Images are stored inside the data field of the imgData as a list of SpatialImages, which enables storing three types of information that may be associated with an image:

  • grob: a grob object of the image
  • path: a file path from which to load the image
  • url: a URL from which to retrieve the image

Data available in an object of class SpatialImage may be accessed via the imgGrob(), imgPath() and imgUrl() accessors.

imgData(ve)$data
## [[1]]
## A SpatialImage with 2 source(s):
##       > loaded
## grob: Av
## path: /Library/Frameworks/R.framework/Versions/4.0/Resou
##       rces/library/SpatialExperiment/extdata/10xVisium/s
##       ection1/spatial/tissue_lowres_image.png
##  url: NA
## 
## [[2]]
## A SpatialImage with 2 source(s):
##       > loaded
## grob: Av
## path: /Library/Frameworks/R.framework/Versions/4.0/Resou
##       rces/library/SpatialExperiment/extdata/10xVisium/s
##       ection2/spatial/tissue_lowres_image.png
##  url: NA

While grobs can be used directly for plotting (e.g. using grid.draw or ggplot2), when multiple images are to be stored (say, for many samples and of different resolutions), or when a SpatialExperiment is to be exported, the path and url provide the option to store an image’s source at minimal storage cost.

# retrieve 'SpatialImage' for 1st image entry & visualize it
si <- imgData(ve)$data[[1]]
grid::grid.draw(imgGrob(si))

5.2 Methods for image handling

The SpatialExperiment package provides various functions to handle which and how image data is stored in the object. These include:

  • loadImg to actively load the image (from a path or URL) and store it as a grob
  • unloadImg to drop the grob, while retaining the source path and/or URL
  • addImg to add an image entry (as a path, URL, or grob)
  • removeImg to drop an image entry entirely

Besides a path or URL to source the image from and a numeric scale factor, addImg() requires specification of the sample_id the new image belongs to, and an image_id that is not yet in use for that sample:

url <- "https://i.redd.it/3pw5uah7xo041.jpg"
ve <- addImg(ve, 
  sample_id = "section1", image_id = "pomeranian",
  imageSource = url, scaleFactor = NA_real_, load = TRUE)

grb <- imgGrob(ve, 
  sample_id = "section1", 
  image_id = "pomeranian")
grid::grid.draw(grb)

loadImg() and add/removeImg() are more flexible in the specification of the sample/image_id arguments. Specifically,

  • TRUE is equivalent to all, e.g. sample_id = "<sample>", image_id = TRUE will drop all images for a given sample.
  • NULL defaults to the first entry available, e.g., sample_id = "<sample>", image_id = NULL will drop the first image for a given sample.

For example, sample_id,image_id = TRUE,TRUE will specify all images; NULL,NULL corresponds to the first image entry in the imgData; TRUE,NULL equals the first image for all samples; and NULL,TRUE matches all images for the first sample.

Example 1: Unload all images, i.e., drop all grobs. As a result, grob slots will be set to NULL, and all SpatialImages now say > not loaded.

imgData(ve <- unloadImg(ve, sample_id = TRUE, image_id = TRUE))$data
## [[1]]
## A SpatialImage with 1 source(s):
##       > not loaded
## grob: NA
## path: /Library/Frameworks/R.framework/Versions/4.0/Resou
##       rces/library/SpatialExperiment/extdata/10xVisium/s
##       ection1/spatial/tissue_lowres_image.png
##  url: NA
## 
## [[2]]
## A SpatialImage with 1 source(s):
##       > not loaded
## grob: NA
## path: /Library/Frameworks/R.framework/Versions/4.0/Resou
##       rces/library/SpatialExperiment/extdata/10xVisium/s
##       ection2/spatial/tissue_lowres_image.png
##  url: NA
## 
## [[3]]
## A SpatialImage with 1 source(s):
##       > not loaded
## grob: NA
## path: NA
##  url: https://i.redd.it/3pw5uah7xo041.jpg

Example 2: Reload the first image for sample section2; the corresponding image now says grob: Av (for available) and > loaded.

imgData(ve <- loadImg(ve, "section2"))$data[[2]]
## A SpatialImage with 2 source(s):
##       > loaded
## grob: Av
## path: /Library/Frameworks/R.framework/Versions/4.0/Resou
##       rces/library/SpatialExperiment/extdata/10xVisium/s
##       ection2/spatial/tissue_lowres_image.png
##  url: NA

Example 3: Remove lowres image for section1 sample; the image is now completely gone from the imgData.

imgData(ve <- removeImg(ve, "section1", "pomeranian"))
## DataFrame with 2 rows and 6 columns
##     sample_id    image_id   data     width    height scaleFactor
##   <character> <character> <list> <integer> <integer>   <numeric>
## 1    section1      lowres              600       600   0.0516351
## 2    section2      lowres              600       600   0.0516351

5.3 Image caching

In general, paths take precedence over URLs, i.e. an image will only be loaded from the imgUrl if there is no imgPath available.

Images loaded from URLs will be cached, i.e. they are downloaded once and the file path where they were stored in will be stored under the SpatialImage’s imgPath:

imgData(loadImg(ve, TRUE, TRUE))$data[[2]]
## A SpatialImage with 2 source(s):
##       > loaded
## grob: Av
## path: /Library/Frameworks/R.framework/Versions/4.0/Resou
##       rces/library/SpatialExperiment/extdata/10xVisium/s
##       ection2/spatial/tissue_lowres_image.png
##  url: NA

For example, if we unload section1’s fullres image now and reload it, there won’t be another progress bar (the chunk somehow above doesn’t display one anyways… but it does in the console):

ve <- unloadImg(ve, TRUE, TRUE)
ve <- loadImg(ve, TRUE, TRUE)

6 colData replacement

While storing of sample_ids, the in_tissue indicator, and spatial xy_coords inside the SpatialExperiment’s colData enables directly accessibility via the colData and $ accessors, these fields are protected against arbitrary modification. This affects replacement operations to the following effects:

Renaming is generally not permitted:

names(colData(ve))[1] <- "a"

Replacement of sample_ids is permitted provided that

  1. the number of unique sample identifiers is retained
  2. newly provided sample identifiers are a one-to-one mapping
ve$sample_id <- sample(c("a", "b", "c"), ncol(ve), TRUE)
## Warning in .local(x, ..., value): Number of unique 'sample_id's is 2, but 3 were provided.
## Overwriting
ve$sample_id <- sample(c("a", "b"), ncol(ve), TRUE)
## Warning in .local(x, ..., value): New 'sample_id's must map uniquely

Valid replacement will be propagated to the imgData:

tmp <- ve
i <- as.numeric(factor(ve$sample_id))
tmp$sample_id <- c("sample1", "sample2")[i]
imgData(tmp)
## DataFrame with 2 rows and 6 columns
##     sample_id    image_id   data     width    height scaleFactor
##   <character> <character> <list> <integer> <integer>   <numeric>
## 1     sample1      lowres              600       600   0.0516351
## 2     sample2      lowres              600       600   0.0516351

The in_tissue field may be modified provided that the former is a logical vector

ve$in_tissue <- "x"
## Warning in .local(x, ..., value): 'in_tissue' field in 'colData' should be
## 'logical'

colData(.) <- NULL will retain only the sample_id, in_tissue and xy_coords fields:

names(colData(ve))
## [1] "a"         "sample_id" "x_coord"   "y_coord"   "in_tissue" "array_row"
## [7] "array_col"
colData(ve) <- NULL
names(colData(ve))
## [1] "sample_id" "x_coord"   "y_coord"   "in_tissue" "array_row" "array_col"

7 Visualization

The DataFrame format enforced by the imgData field works nicely with ggplot2. Here’s a prototype of what a wrapper for plotting could look like that supports

ggspatial <- function(x, 
  fill=NULL, highlight=NULL, assay="logcounts", 
  sample_id=NULL, image_id=NULL, points=TRUE, images=TRUE) 
{
  # check validity of input arguments
  stopifnot(
    is.logical(points), length(points) == 1,
    is.logical(images), length(images) == 1)
    
  if (is.null(assay)) {
    assay <- assayNames(x)[1]
  } else {
    stopifnot(
      is.character(assay), 
      length(assay) == 1, 
      assay %in% assayNames(x))
  }
  
  if (is.null(sample_id)) {
    # default to using all samples
    samples <- unique(x$sample_id)
  } else {
    stopifnot(sample_id %in% x$sample_id)
    samples <- sample_id
  }
  # subset specified samples
  x <- x[, x$sample_id %in% samples]
  
  if (is.null(image_id)) {
    # default to first available image for each sample
    idx <- SpatialExperiment:::.get_img_idx(x, TRUE, NULL)
  } else {
    if (length(image_id) == 1) {
      idx <- SpatialExperiment:::.get_img_idx(x, TRUE, image_id)
    } else {
      stopifnot(length(image_id) == length(samples))
      idx <- mapply(s=samples, i=image_id,
        function(s, i) .get_img_idx(x, s, i))
    }
  }
  # subset specified images
  imgData(x) <- imgData(x)[idx, ]
  
  df <- as.data.frame(colData(x))
  
  # df <- as.data.frame(colData(x)) ## this must be changed with spatialCoords(x)
  xy <- grep("x_coord|y_coord", names(df))
  xy <- names(df)[xy] <- c("x", "y")
  
# scale spatial coordinates
  sfs <- setNames(scaleFactors(x, TRUE, TRUE), samples)
  df[, xy] <- as.matrix(df[, xy]) * sfs[df$sample_id]
  
  facets <- "sample_id"
  if (!is.null(fill)) {
    stopifnot(
      is.character(fill), 
      all(fill %in% rownames(x)) 
      || length(fill) == 1 & !is.null(x[[fill]]))
    if (all(fill %in% rownames(x))) {
      y <- assay(x, assay)
      y <- as.matrix(y[fill, ])
      df <- cbind(df, t(y))
      df <- pivot_longer(df, all_of(fill))
      fill <- "value"
      facets <- c(facets, "name")
    }
  }
  
  # in principle, images can be plotted using 'annotation_custom()'
  # however, this does not allow for faceting and we instead 
  # construct a separate image layer for each sample
  if (images) {
    # split 'imgData' by sample
    dfs <- split(imgData(x), imgData(x)$sample_id)
    # construct separate image layer for each sample
    images <- lapply(dfs, function(.) layer(
      data=as_tibble(.),
      inherit.aes=FALSE,
      stat="identity",
      position="identity",
      geom=ggplot2::GeomCustomAnn,
      params=list(
        grob=imgGrob(.$data[[1]]),
        xmin=0, xmax=.$width,
        ymin=0, ymax=.$height)))
  } else images <- NULL
  
  if (points) {
    guide <- ifelse(is.numeric(df[[fill]]), guide_colorbar, guide_legend)
    points <- list(
      guides(fill=guide(
        title=ifelse(fill == "value", assay, fill), 
        order=1, override.aes=list(col=NA, size=3))),
      geom_point(shape=21, size=1.25, stroke=0.25))
    if (!is.null(highlight)) {
      txt <- sprintf("with(df,%s)", highlight)
      df$highlight <- eval(parse(text=txt))
      highlights <- list(
        scale_color_manual(highlight, values=c("transparent", "black")),
        guides(col=guide_legend(override.aes=list(
          size=2, stroke=1, col=c("grey", "black")))))
    } else {
      df$highlight <- "transparent"
      highlights <- scale_color_identity()
    }
  } else {
    # this is required, else the image layer doesn't show
    points <- geom_point(col="transparent")
    highlights <- NULL
  }

  if (!is.null(facets)) {
    if (length(facets) == 1) {
      facets <- facet_wrap(facets)
    } else {
      ns <- vapply(df[facets], function(.) length(unique(.)), numeric(1))
      if (ns[2] > ns[1]) facets <- rev(facets)
      facets <- reformulate(facets[1], facets[2])
      facets <- list(
        facet_grid(facets, switch="y"),
        theme(strip.text.y.right = element_text(angle=90)))
    }
  } else facets <- NULL
  
  ggplot(df, 
    aes_string("x", "y", fill=fill, col="highlight")) +
    images + points + highlights + facets +
    coord_equal(expand=FALSE) + theme_void() +
    theme(legend.key.size=unit(0.5, "lines"))
}

Here are some example visualizations with ggspatial():

# read10xCounts gives a 'DelayedMatrix'
# which doesn't work with 'scater'
assay(ve) <- as(assay(ve), "matrix")

# normalization & quality control
ve <- logNormCounts(ve)
ve <- addPerFeatureQC(ve)
ve <- addPerCellQC(ve)
# plotting images only
ggspatial(ve, points=FALSE)

# both samples colored by total counts with 'in_tissue' spots highlighted
ggspatial(ve, fill="total", highlight="in_tissue", images=FALSE) + 
  scale_fill_viridis_c(trans="log10")

# expression of some arbitrary highly expressed genes across samples
gs <- rownames(ve)[rowData(ve)$mean > 250]
ggspatial(ve[, isInTissue(ve)], fill=gs) + scale_fill_viridis_c()

Session information

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] tidyr_1.1.2                 SpatialExperiment_0.99.4   
##  [3] scater_1.18.3               SingleCellExperiment_1.12.0
##  [5] SummarizedExperiment_1.20.0 Biobase_2.50.0             
##  [7] GenomicRanges_1.42.0        GenomeInfoDb_1.26.2        
##  [9] IRanges_2.24.0              S4Vectors_0.28.1           
## [11] BiocGenerics_0.36.0         MatrixGenerics_1.2.0       
## [13] matrixStats_0.57.0          ggplot2_3.3.2              
## [15] dplyr_1.0.2                 BiocStyle_2.18.1           
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6              bit64_4.0.5              
##  [3] httr_1.4.2                tools_4.0.3              
##  [5] R6_2.5.0                  irlba_2.3.3              
##  [7] HDF5Array_1.18.0          vipor_0.4.5              
##  [9] DBI_1.1.0                 colorspace_2.0-0         
## [11] rhdf5filters_1.2.0        withr_2.3.0              
## [13] tidyselect_1.1.0          gridExtra_2.3            
## [15] bit_4.0.4                 curl_4.3                 
## [17] compiler_4.0.3            BiocNeighbors_1.8.2      
## [19] DelayedArray_0.16.0       labeling_0.4.2           
## [21] bookdown_0.21             scales_1.1.1             
## [23] rappdirs_0.3.1            stringr_1.4.0            
## [25] digest_0.6.27             rmarkdown_2.5            
## [27] R.utils_2.10.1            XVector_0.30.0           
## [29] pkgconfig_2.0.3           htmltools_0.5.0          
## [31] sparseMatrixStats_1.2.0   limma_3.46.0             
## [33] dbplyr_2.0.0              rlang_0.4.9              
## [35] RSQLite_2.2.1             DelayedMatrixStats_1.12.1
## [37] farver_2.0.3              generics_0.1.0           
## [39] BiocParallel_1.24.1       R.oo_1.24.0              
## [41] RCurl_1.98-1.2            magrittr_2.0.1           
## [43] BiocSingular_1.6.0        GenomeInfoDbData_1.2.4   
## [45] scuttle_1.0.3             Matrix_1.2-18            
## [47] Rcpp_1.0.5                ggbeeswarm_0.6.0         
## [49] munsell_0.5.0             Rhdf5lib_1.12.0          
## [51] viridis_0.5.1             lifecycle_0.2.0          
## [53] R.methodsS3_1.8.1         stringi_1.5.3            
## [55] yaml_2.2.1                edgeR_3.32.0             
## [57] zlibbioc_1.36.0           rhdf5_2.34.0             
## [59] BiocFileCache_1.14.0      grid_4.0.3               
## [61] blob_1.2.1                dqrng_0.2.1              
## [63] crayon_1.3.4              lattice_0.20-41          
## [65] beachmat_2.6.2            magick_2.5.2             
## [67] locfit_1.5-9.4            knitr_1.30               
## [69] pillar_1.4.7              rjson_0.2.20             
## [71] codetools_0.2-18          glue_1.4.2               
## [73] evaluate_0.14             BiocManager_1.30.10      
## [75] vctrs_0.3.5               gtable_0.3.0             
## [77] purrr_0.3.4               assertthat_0.2.1         
## [79] xfun_0.19                 rsvd_1.0.3               
## [81] DropletUtils_1.10.1       viridisLite_0.3.0        
## [83] tibble_3.0.4              beeswarm_0.2.3           
## [85] memoise_1.1.0             ellipsis_0.3.1